Interpolation B+ tree: an efficient file structure with which to store and search large, almost static data sets

Author

  • Hazel Webb
Abstract

Many applications generate data sets too large to be manipulated effectively with conventional main-memory techniques. One such application, in DNA (deoxyribonucleic acid) research, is the log that tracks samples through the analysis process. Analyses of this kind can easily produce 10,000,000 records in a short period. Tracking a single sample through the process involves searching this very large log for the set of records that store the sample's unique bar code. In this paper, a specialized external-memory data structure is proposed that provides efficient access to the required records. We examine an adaptation of the Pegasus method for finding the root of a continuous function, used in conjunction with a suitable external-memory data structure. We show that applying the Pegasus method together with a modified B tree provides close to optimal I/O efficiency for executing a set of point queries on large log files.
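The Pegasus method is a modified regula falsi: each probe is placed where the straight line through the two bracket endpoints crosses zero, and when the same endpoint survives two consecutive steps its function value is scaled down so the probes stop creeping toward one side. The sketch below shows one way such a root finder could be turned into a key search over a sorted array. It assumes integer keys (for example, numeric bar codes) held in an in-memory list, and it omits the on-disk B tree entirely. The name `pegasus_search`, the half-unit shift that turns "first key >= target" into a sign change, and the clamping of probes are illustrative choices, not the paper's algorithm.

```python
from typing import Optional, Sequence


def pegasus_search(keys: Sequence[int], target: int) -> int:
    """Index of the first occurrence of `target` in ascending integer `keys`, or -1.

    Hypothetical sketch: Pegasus-style regula falsi on f(i) = keys[i] - (target - 0.5).
    """
    n = len(keys)
    if n == 0 or target > keys[-1]:
        return -1
    if target <= keys[0]:
        return 0 if keys[0] == target else -1

    # With integer keys, f changes sign exactly between the last key < target
    # and the first key >= target, so the search becomes root bracketing.
    shift = target - 0.5
    lo, hi = 0, n - 1            # invariant: keys[lo] < target <= keys[hi]
    f_lo = keys[lo] - shift      # always < 0 (may be Pegasus-scaled)
    f_hi = keys[hi] - shift      # always > 0 (may be Pegasus-scaled)
    last_moved: Optional[str] = None  # endpoint replaced by the previous probe

    while hi - lo > 1:
        # Regula falsi: where the secant through the bracket ends crosses zero.
        x = lo + (-f_lo) * (hi - lo) / (f_hi - f_lo)
        mid = min(max(round(x), lo + 1), hi - 1)  # keep the probe strictly inside
        f_mid = keys[mid] - shift

        if f_mid < 0:
            if last_moved == "lo":
                # lo replaced twice in a row: damp the stale hi value (Pegasus rule).
                f_hi *= f_lo / (f_lo + f_mid)
            lo, f_lo, last_moved = mid, f_mid, "lo"
        else:
            if last_moved == "hi":
                # hi replaced twice in a row: damp the stale lo value.
                f_lo *= f_hi / (f_hi + f_mid)
            hi, f_hi, last_moved = mid, f_mid, "hi"

    # hi is now the first index with keys[hi] >= target.
    return hi if keys[hi] == target else -1


if __name__ == "__main__":
    print(pegasus_search([1, 3, 3, 7, 10], 3))  # prints 1
```

Correctness rests only on the bracket invariant, since every probe lies strictly between `lo` and `hi` and is classified by a true comparison; the interpolation and damping affect only how quickly the bracket shrinks. The damping step matters when many consecutive records share one bar code, because it pulls probes away from an endpoint that has been retained for several iterations.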

Publication date: 2004